Modular Learning Systems for Behavior Acquisition in Multi-Agent Environment
Abstract
There has been a great deal of research on reinforcement learning in multi-robot/agent environments over the last decades. A wide range of applications has been investigated, such as foraging robots (Mataric, 1997), soccer-playing robots (Asada et al., 1996), and prey-pursuing robots (Fujii et al., 1998). However, a straightforward application of simple reinforcement learning methods to multi-robot dynamic systems raises many issues: uncertainty caused by other agents, distributed control, partial observability of the internal states of others, asynchronous action taking, and so on. In this paper we mainly focus on two major difficulties in practical use: (1) unstable dynamics caused by the policy alternation of other agents, and (2) the curse of dimensionality. The policy alternation of others in a multi-agent environment may cause sudden changes in the state transition probabilities, whose constancy is needed for behavior learning to converge. Asada et al. (Asada et al., 1999) proposed a method that sets a global learning schedule in which only one agent is specified as a learner while the rest of the agents follow fixed policies, thus avoiding the issue of simultaneous learning. As a matter of course, they did not consider the alternation of the opponents' policies. Ikenoue et al. (Ikenoue et al., 2002) showed simultaneous cooperative behavior acquisition by fixing the learners' policies for a certain period during the learning process. In the case of cooperative behavior acquisition, no agent has any reason to change its policy while it continues to acquire positive rewards as a result of the mutual cooperative behavior. The agents update their policies gradually, so the state transition probabilities can be regarded as almost fixed from the viewpoint of the other learning agents.
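The scheduling idea above (one designated learner per phase, all other agents frozen so that transitions stay near-stationary) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the class and function names are hypothetical.

```python
import random
from collections import defaultdict

class QAgent:
    """Tabular Q-learner that can be frozen into a fixed greedy policy."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.frozen = False

    def act(self, state):
        # Frozen agents act greedily; the learner keeps exploring.
        if not self.frozen and random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2):
        if self.frozen:                      # fixed-policy agents do not learn
            return
        best = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best - self.q[(s, a)])

def schedule(agents, phases):
    """Global learning schedule: rotate the single-learner role per phase."""
    for phase in range(phases):
        learner = phase % len(agents)
        for i, agent in enumerate(agents):
            agent.frozen = (i != learner)
        yield learner
```

During each phase only the yielded agent calls `update` with effect, so from its viewpoint the other agents form a stationary part of the environment.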
Kuhlmann and Stone (Kuhlmann and Stone, 2004) have applied a reinforcement learning system with a function approximator to the keepaway problem in the RoboCup simulation league. In their work, only the passer learns its policy, which is to keep the ball away from the opponents. The other agents (receivers and opponents) follow fixed policies given by the designer beforehand. The amount of information to be handled in a multi-agent system tends to be huge and easily causes the curse of dimensionality. Elfwing et al. (Elfwing et al., 2004) achieved a cooperative behavior learning task between two robots in real time by introducing the ...
Similar resources
Modular Learning System and Scheduling for Behavior Acquisition in Multi-agent Environment
Existing reinforcement learning approaches have suffered from the policy alternation of others in multi-agent dynamic environments such as RoboCup competitions, since other agents' behaviors may cause sudden changes in the state transition probabilities, whose constancy is necessary for the learning to converge. A modular learning approach would be able to solve this problem if a learning a...
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized-learning-automata-based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
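The basic building block behind learning-automata approaches like the one above is a probability-updating automaton; a common variant is the linear reward-inaction (L_R-I) scheme. The sketch below illustrates that standard scheme only, not the paper's generalized algorithm, and the names are illustrative.

```python
import random

class LRIAutomaton:
    """Linear reward-inaction (L_R-I) learning automaton."""

    def __init__(self, n_actions, lr=0.1):
        self.p = [1.0 / n_actions] * n_actions   # action probabilities
        self.lr = lr

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def reward(self, action):
        # On reward, move probability mass toward the chosen action;
        # on penalty ("inaction"), probabilities are left unchanged.
        for i in range(len(self.p)):
            if i == action:
                self.p[i] += self.lr * (1.0 - self.p[i])
            else:
                self.p[i] *= (1.0 - self.lr)
```

The update preserves the probability simplex, and repeated rewards for one action drive its probability toward 1.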
Modular Learning Systems for Soccer Robot
This paper presents a series of studies of a modular learning system for vision-based behavior acquisition by a soccer robot participating in the middle-size league of RoboCup (Asada et al., 1999). Reinforcement learning has recently been receiving increased attention as a method for behavior learning with little or no a priori knowledge and a higher capability of reactive and adaptive behaviors. H...
Modular Q-learning based multi-agent cooperation for robot soccer
In a multi-agent system, action selection is important for cooperation and coordination among agents. As the environment is dynamic and complex, modular Q-learning, one of the reinforcement learning schemes, is employed to assign a proper action to an agent in the multi-agent system. The architecture of modular Q-learning consists of learning modules and a mediator module. The m...
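The architecture described above (learning modules plus a mediator) can be sketched as follows. This is a minimal sketch under the common "greatest mass" merging assumption, with hypothetical names, not the paper's exact mediator design.

```python
from collections import defaultdict

class Module:
    """One Q-learning module over its own small state abstraction."""

    def __init__(self, abstract, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)
        self.abstract = abstract          # maps full state -> module state
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def value(self, state, action):
        return self.q[(self.abstract(state), action)]

    def update(self, s, a, r, s2):
        ms, ms2 = self.abstract(s), self.abstract(s2)
        best = max(self.q[(ms2, a2)] for a2 in self.actions)
        self.q[(ms, a)] += self.alpha * (r + self.gamma * best - self.q[(ms, a)])

def mediate(modules, state, actions):
    """Mediator: pick the action with the greatest summed module Q-value."""
    return max(actions, key=lambda a: sum(m.value(state, a) for m in modules))
```

Because each module sees only its own abstraction of the state, the per-module tables stay small, which is the point of the modular decomposition against the curse of dimensionality.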
Modular Bayesian Inference and Learning of Decision Networks as Stand-alone Mechanisms of the Mabel Model: Implications for Visualization, Comprehension, and Policy Making
This paper describes a modular component of the MABEL model agents’ cognitive inference mechanism. The probabilistic and probabilogic representation of the agents’ environment and state space is coupled with a Bayesian belief and decision network functionality, which in fact holds Markovian semiparametric properties. Different approaches to modeling multi-agent systems are described and analyze...
Journal:
Volume, Issue:
Pages: -
Published: 2008